TP 3

Group : Alexandre BOISTARD, William ROCHE, Ethan TRENTIN

First, let's implement a function that simulates one path of a Hawkes process with a generic decreasing kernel using the thinning algorithm.

For both the thinning and the branching algorithms, we will need a generic decreasing kernel. We will implement a single exponential kernel in the following function.
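To fix ideas, a minimal Python sketch of such a kernel (the `alpha` and `beta` values below are illustrative placeholders, not the parameters used in our experiments):

```python
import numpy as np

def exp_kernel(t, alpha=0.8, beta=1.2):
    """Exponential Hawkes kernel phi(t) = alpha * exp(-beta * t) for t >= 0,
    and 0 for t < 0 (alpha, beta are illustrative placeholder values)."""
    t = np.asarray(t, dtype=float)
    return np.where(t >= 0, alpha * np.exp(-beta * np.maximum(t, 0.0)), 0.0)
```

Any decreasing kernel with the same signature could be swapped in.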

The implementation of the Hawkes process with the thinning algorithm follows the methodology described on slide 11 of the lesson on Hawkes processes.
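A self-contained sketch of such a thinning (Ogata-style) simulator for the exponential kernel; the function and parameter names are our own, not the ones from the lesson:

```python
import numpy as np

def hawkes_thinning(lambda0=1.0, alpha=0.8, beta=1.2, T=10.0, rng=None):
    """Simulate one Hawkes path on [0, T] by thinning.
    Intensity: lambda(t) = lambda0 + sum_{t_i < t} alpha * exp(-beta*(t - t_i))."""
    rng = np.random.default_rng(rng)
    times = []
    t = 0.0
    while t < T:
        # The intensity is non-increasing between events, so its current value
        # is a valid upper bound until the next accepted point.
        lam_bar = lambda0 + sum(alpha * np.exp(-beta * (t - ti)) for ti in times)
        t += rng.exponential(1.0 / lam_bar)
        if t >= T:
            break
        lam_t = lambda0 + sum(alpha * np.exp(-beta * (t - ti)) for ti in times)
        if rng.uniform() <= lam_t / lam_bar:  # accept with prob lambda(t)/lam_bar
            times.append(t)
    return np.array(times)
```

The inner sums make this quadratic in the number of events; the memoryless recursion mentioned later avoids them.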

Next, we are going to write a function that simulates one path of a Hawkes process with a generic decreasing kernel (again we will use the exponential kernel) using the branching algorithm.

To do so, we first need to implement the simulation of a non-homogeneous Poisson process. We will reuse the code from lab 2, implemented with a global thinning algorithm.
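A hedged sketch of the two pieces, assuming a global-thinning NHPP simulator similar in spirit to the lab 2 one (function names and default parameter values are ours):

```python
import numpy as np

def nhpp_thinning(intensity, lam_max, T, rng):
    """Non-homogeneous Poisson process on [0, T] by global thinning,
    given an upper bound lam_max on the intensity (rng: numpy Generator)."""
    n = rng.poisson(lam_max * T)
    cand = np.sort(rng.uniform(0.0, T, size=n))
    keep = rng.uniform(size=n) <= intensity(cand) / lam_max
    return cand[keep]

def hawkes_branching(lambda0=1.0, alpha=0.8, beta=1.2, T=10.0, rng=None):
    """Simulate a Hawkes path via the branching (cluster) representation:
    immigrants arrive at rate lambda0; each event t_i spawns children from a
    non-homogeneous Poisson process of intensity alpha*exp(-beta*s) on (t_i, T]."""
    rng = np.random.default_rng(rng)
    gen = nhpp_thinning(lambda s: np.full_like(s, lambda0), lambda0, T, rng).tolist()
    all_times = list(gen)
    while gen:
        children = []
        for ti in gen:
            # the offspring intensity is bounded by its value at lag 0, i.e. alpha
            kids = ti + nhpp_thinning(
                lambda s: alpha * np.exp(-beta * s), alpha, T - ti, rng)
            children.extend(kids.tolist())
        all_times.extend(children)
        gen = children
    return np.sort(np.array(all_times))
```

The loop terminates almost surely as long as the branching ratio $\alpha/\beta$ is below 1 (subcritical regime).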

Now, to visualize the results of our algorithms, we are going to plot the Hawkes processes resulting from both the thinning and the branching algorithms.

Let's check that our thinning and branching algorithms correctly return the times of the Hawkes process generated.

And now, we will visualize the Hawkes processes created by the thinning and branching algorithms.

Comments: These plots display the simulation of one path of a Hawkes process, first with the thinning and then with the branching algorithm. For both, we plotted the intensity of the process, the occurrence times of the events (triangle markers), and the running intensity maximum as a piecewise function built from the local maxima. These figures have a shape similar to the ones obtained in class.

The events in a Hawkes process can trigger subsequent events, creating a self-exciting process with clustering of events over time. We can find such clusters of events in our plots.

When an event occurs, the intensity of the process jumps immediately afterwards. This again reflects the self-exciting nature of the process and is modeled here by the exponential kernel we use: right after an event, the intensity spikes and then decays exponentially over time, which explains the sudden jumps in the intensity path.

Therefore, the plot effectively demonstrates the dynamics of a Hawkes process with an exponential kernel: we can see the self-exciting nature of the process and the exponential decay after the spikes in intensity caused by the triggering of events.
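For reference, the intensity path described above can be evaluated on a time grid as follows (a sketch; the parameter values are placeholders):

```python
import numpy as np

def hawkes_intensity(grid, times, lambda0=1.0, alpha=0.8, beta=1.2):
    """Evaluate lambda(t) = lambda0 + sum_{t_i < t} alpha * exp(-beta*(t - t_i))
    on an array of time points (parameter values are illustrative)."""
    lags = np.asarray(grid, float)[:, None] - np.asarray(times, float)[None, :]
    contrib = np.where(lags > 0, alpha * np.exp(-beta * np.maximum(lags, 0.0)), 0.0)
    return lambda0 + contrib.sum(axis=1)
```

With no events the intensity is flat at $\lambda_0$; just after an event it jumps by $\alpha$, matching the spikes seen on the plots.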

Comments: When increasing the time horizon, many more events are displayed, which sometimes makes it harder to visualize the clustering of events. The same characteristics of the Hawkes process can still be found, just less visibly, due to the time scale and the many spikes in intensity.

Comments: These plots display the QQ-plots of the inter-arrival times of the Hawkes processes generated by the thinning and the branching algorithms. They compare the empirical quantiles of the distribution against the theoretical quantiles of the exponential distribution (which the inter-arrival times are supposed to follow).

We can see that the thinning algorithm seems to generate inter-arrival times closer to exponentially distributed: its points lie nearer the theoretical line. Overall, this confirms that we succeeded in generating Hawkes processes (even if the tails of the inter-arrival times are fatter than expected for the branching algorithm).
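Strictly speaking, the raw inter-arrival times of a Hawkes process are not exactly exponential; by the time-rescaling theorem it is the compensator increments $\Lambda(t_i) - \Lambda(t_{i-1})$ that should be i.i.d. Exp(1). A sketch of that residual diagnostic, using the closed-form compensator of the exponential kernel (parameter names and values are ours):

```python
import numpy as np

def compensator(t, times, lambda0=1.0, alpha=0.8, beta=1.2):
    """Closed-form compensator Lambda(t) = lambda0*t
    + (alpha/beta) * sum_{t_i < t} (1 - exp(-beta*(t - t_i)))."""
    times = np.asarray(times, dtype=float)
    past = times[times < t]
    return lambda0 * t + (alpha / beta) * np.sum(1.0 - np.exp(-beta * (t - past)))

def rescaled_gaps(times, **params):
    """Time-rescaling residuals: i.i.d. Exp(1) if the model is well specified."""
    lam = np.array([compensator(t, times, **params) for t in times])
    return np.diff(np.concatenate(([0.0], lam)))
```

These residuals can then be compared to Exp(1) quantiles in the same kind of QQ-plot.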

Let's focus now on the properties of Hawkes MLE estimates.

Question 1 : Properties of Hawkes MLE estimates. Check that MLE estimators computed with the Hawkes library on samples simulated by our simulators exhibit expected statistical properties.

A small notational caveat for what follows: the library uses $\mu$ where we have used $\lambda_0$ since the beginning, so we will need to be careful and handle this mapping in our functions.

To check that the MLE estimators exhibit the expected statistical properties, we need to verify three of them: consistency, asymptotic normality, and asymptotic efficiency. We will denote by $\hat{\theta}_T = (\hat{\lambda}_0, \hat{\alpha}_j, \hat{\beta}_j)$ the MLE estimator at horizon $T$.
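As a reminder (stated under standard regularity conditions, with $\theta^*$ the true parameter and $I(\theta)$ the Fisher information matrix per unit of time), these three properties read:

- Consistency: $\hat{\theta}_T \to \theta^*$ in probability as $T \to \infty$.
- Asymptotic normality: $\sqrt{T}\,(\hat{\theta}_T - \theta^*) \to \mathcal{N}\big(0,\, I(\theta^*)^{-1}\big)$ in distribution.
- Asymptotic efficiency: the variance of $\hat{\theta}_T$ asymptotically attains the Cramér-Rao lower bound $\frac{1}{T} I(\theta^*)^{-1}$.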

First, let's focus on consistency!

Comments: These plots show the consistency of the MLE parameter estimates as the time horizon increases. Globally, the convergence of the parameters towards their true values is visible as $T$ grows to $\infty$.

We will now display the consistency over multiple paths and check the asymptotic efficiency in the meantime (as multiple paths allow the computation of standard deviations). Averaging will reduce variability and improve statistical stability, though some fluctuations will still be visible.

Comments: By displaying the consistency over multiple paths, we can conclude that the pattern seen on the single-path simulation still holds: the branching algorithm converges better than the thinning one. Indeed, the branching algorithm has a more structured approach and models the event dependencies more explicitly, which ensures better convergence.

We added a standard-deviation plot in order to visualize the convergence of the three parameters and check the asymptotic efficiency property: the standard deviation decreases roughly as $O(T^{-0.2})$ to $O(T^{-0.3})$ for the thinning algorithm, and a bit faster for the branching one. The exponent is computed on the last points of the time array, i.e. for the largest time horizons. This means the variance decreases roughly as $O(T^{-0.4})$ to $O(T^{-0.5})$. For the estimator to be considered asymptotically efficient, the variance should decrease as $O(\frac{1}{T})$, but the few points used increase variability and may have biased the estimated rate. Increasing num_paths should stabilize the plots, but it takes far too long. Formally, the estimators would need to asymptotically reach the lower bound on the variance of an estimator, the Cramér-Rao bound; computing it requires the Fisher information matrix derived from the expression of the likelihood. Here, we only checked that the variance decreases, asymptotically at a rate of roughly $O(\frac{1}{\sqrt{T}})$.
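The exponent we quote is simply the slope of a log-log regression over the largest horizons; a minimal sketch of that fit (the synthetic $T^{-1/2}$ data below only sanity-checks the helper, it is not our measured data):

```python
import numpy as np

def powerlaw_exponent(T, y, tail=4):
    """Estimate s in y ~ C * T^s from the last `tail` points,
    via a linear fit of log(y) against log(T)."""
    logT, logy = np.log(np.asarray(T, float)), np.log(np.asarray(y, float))
    slope, _ = np.polyfit(logT[-tail:], logy[-tail:], 1)
    return slope

# Sanity check on synthetic data: std ~ T^{-1/2} gives a slope near -0.5.
horizons = np.array([50.0, 100.0, 200.0, 400.0, 800.0])
stds = 3.0 * horizons ** -0.5
```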

Now, we are going to study the asymptotic normality property that the MLE estimator is supposed to satisfy.

Comments: These plots display the distribution of the three parameters over num_paths simulations and for several time-horizon values, for both the thinning and the branching algorithm. We fitted a normal distribution on top of the empirical one in order to check whether the asymptotic normality property holds. Based on what we observe, the larger $T$, the better the normal distribution fits the empirical data, for all parameters. The branching algorithm seems to perform better in that respect. On top of this, we added a QQ-plot as a visual goodness-of-fit check of the normal distribution for the parameters: the theoretical quantiles are those of the normal distribution, and the empirical ones come from our simulated data. We can also see a tendency to fit the theoretical line better as the time horizon increases, but for a QQ-plot, the more points the better, so a general conclusion is hard to draw in that case.

To get better results, we should increase num_paths to have more points in the distribution and also increase the time horizon, but both slow our algorithms down considerably.

Now, we are going to look more closely at the computational cost of the two algorithms and compare them with the implementation provided by the Hawkes library.

Question 2: Computational cost of Hawkes simulators. Compare, for an exponential kernel, the computational cost of your thinning algorithm, your branching algorithm, and the simulation of the Hawkes library. Estimate the complexity of these algorithms w.r.t. the horizon. Explain.

Comments: We can see that the Hawkes library implementation is the fastest. Then, our thinning algorithm is faster than the branching one (by almost a factor of ten), as we exploit the memoryless property of the exponential kernel and simplify the computation by reusing part of the exponential kernel calculation.

But this is only a comparison of the computational cost of the three methods. Now, let's try to evaluate the complexity of these algorithms with respect to the time horizon.
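The timing itself can be done with a simple wall-clock loop; a sketch (the simulator passed in is assumed to take the horizon `T` as its only argument, which is not necessarily the exact signature of our functions):

```python
import time

def measure_runtime(simulate, horizons, reps=3):
    """Median wall-clock time of simulate(T) over `reps` runs, per horizon."""
    medians = []
    for T in horizons:
        samples = []
        for _ in range(reps):
            t0 = time.perf_counter()
            simulate(T)
            samples.append(time.perf_counter() - t0)
        medians.append(sorted(samples)[len(samples) // 2])
    return medians
```

Taking the median over a few repetitions damps outliers caused by background load.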

Comments: This plot of the execution time as a function of the time horizon shows that the thinning and the branching algorithms have a time complexity of $O(T^2)$, whereas the algorithm from the Hawkes library runs in $O(T)$. The plot also confirms our intermediate conclusion that the branching algorithm is about ten times slower than the thinning one. But both complexities have the same dependence on the time horizon, growing with its square. To be more precise, we will estimate the exponent of the power law for the three algorithms' complexities.

Comments: Our observation on the plot is now confirmed. The exponent of the power law for the library algorithm is very close to one, giving a complexity of $O(T^{1.02})$, while the exponent for the branching algorithm is very close to $O(T^2)$, at $O(T^{1.97})$. The thinning algorithm's computational cost is slightly lower than the branching one's, at $O(T^{1.88})$.

The structure of the branching algorithm, which generates a new non-homogeneous Poisson process for each point until no points are left, explains its complexity, whereas exploiting the memoryless structure of the exponential kernel explains why the thinning algorithm is faster.

Question 3 : A Hawkes process for trades. Is a Hawkes process a good model for the time dynamics of the trades reported in your dataset ? Use statistical arguments to support your answers.

To check if the Hawkes process is a good model for the time dynamics of the trades reported in our dataset, we need to import the data, and estimate the parameters of Hawkes processes that would fit the data. Then, we can use some statistical arguments to determine whether or not this type of process is well-suited for the data we have.

Comments: We displayed the parameters of the Hawkes processes that would fit our data, but there is a problem. As our data is in nanoseconds, a beta of $10^{-8}$ means that significant decay only occurs over time intervals on the order of $\frac{1}{\beta} \approx 10^8$ nanoseconds (around 0.1 seconds). As the events occur in rapid succession (much less than 0.1 seconds apart), the kernel effectively acts as if it were constant over the observed intervals: indeed, the exponential in the kernel satisfies $e^{-\beta t} \approx 1$. The fit therefore seems to fail to capture the data for almost every day except one. Let's take a closer look at that day.

We are going to try to display some statistical properties to check how the Hawkes process suits the data.

Comments: This plot is a QQ-plot of the distribution simulated with the estimated parameters versus the empirical distribution of our data, for the chosen day. Visually, the empirical distribution fails to be explained by the simulated one, i.e. by the Hawkes process corresponding to the parameters estimated from the empirical data. Therefore, we could argue that this is not a good model for our data. But let's try to get a more quantitative analysis.

Now, and to finish let's try to use the data of one hour and see if the Hawkes process models it better.

Comments: This plot is a QQ-plot of the distribution simulated with the estimated parameters versus the empirical distribution of our data, for the chosen hour. Visually, the empirical distribution is better explained by the simulated one, i.e. by the Hawkes process corresponding to the parameters estimated from the empirical data. Therefore, we could argue that this is a better model for our data. Let's try to get a more quantitative analysis.

Comments: We can see that the parameters estimated on the one-hour data are better than those estimated on the whole day: the branching ratio is closer to 1, the AIC is lower, and the log-likelihood is higher. This means the model fits the hour of data better than the whole day. Therefore, we could argue that the Hawkes process is a good model for the time dynamics of the trades reported in our dataset, but only over short time intervals; over longer time intervals, it is not a good model for the data.
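For reference, the two quantitative criteria we compare can be computed as follows (a sketch; `loglik` stands for the maximized log-likelihood returned by the fit, and the branching-ratio formula holds for the exponential kernel):

```python
def branching_ratio(alpha, beta):
    """Expected number of offspring per event for the exponential kernel:
    the integral of alpha * exp(-beta * t) over [0, inf), i.e. alpha / beta."""
    return alpha / beta

def aic(loglik, n_params=3):
    """Akaike information criterion, 2k - 2*logL; lower means a better fit."""
    return 2 * n_params - 2 * loglik
```

A branching ratio approaching 1 indicates strong self-excitation (near-critical regime), while a lower AIC trades off fit quality against model size.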